Implement enterprise knowledge base Q&A with the Java SDK

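This topic uses the Java SDK as an example to show how to upload data through the SDK and quickly set up enterprise knowledge base Q&A. An ADD (add data) operation directly overwrites the data of an existing document that has the same id, so it also serves as an UPDATE. For this reason, we recommend using the API or SDK for bulk additions, deletions, and modifications, and for scheduled data synchronization and updates.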
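Because ADD overwrites a document that has the same id, a scheduled sync stays idempotent only if each source record always maps to the same id. The following is a minimal sketch of one way to do that, assuming the id field accepts a 32-character hexadecimal string; the class StableDocId and the method fromSourceKey are hypothetical names used for illustration and are not part of the OpenSearch SDK. MD5 is used here only to derive a stable identifier, not for security.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StableDocId {

    // Hypothetical helper: derives a deterministic document ID from a source key
    // (for example the document URL), so that re-pushing the same source record
    // with cmd=ADD overwrites the earlier copy instead of creating a duplicate.
    public static String fromSourceKey(String sourceKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(sourceKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to provide MD5
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String url = "https://help.aliyun.com/document_detail/464900.html";
        String first = fromSourceKey(url);
        String second = fromSourceKey(url);
        System.out.println("id = " + first);
        System.out.println("same id on every run: " + first.equals(second));
    }
}
```

The resulting string can then be passed as the id field of each document, in place of a hand-assigned value.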

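Prerequisites

  1. Make sure that you have obtained the AccessKey ID and AccessKey Secret of a RAM user. They serve as the credentials for SDK calls.

    1. Create a RAM user and grant permissions

    2. View the AccessKey information of a RAM user

    Note

    The AccessKey Secret is displayed only when it is created and cannot be viewed afterwards.

  2. Make sure that the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET are set in the environment where the code runs. For details, see: Configure environment variables on Linux, macOS, and Windows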
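The samples in this topic read both credentials with System.getenv, which returns null when a variable is not set. A quick check before running them can save a confusing authentication error. The following is a minimal sketch in plain Java; the class CredentialCheck and the method missingVariables are hypothetical names used for illustration and are not part of any SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class CredentialCheck {

    // The two variable names required by the samples in this topic
    static final String[] REQUIRED = {
            "ALIBABA_CLOUD_ACCESS_KEY_ID",
            "ALIBABA_CLOUD_ACCESS_KEY_SECRET"
    };

    // Returns the names of required variables that are absent or blank in the given environment
    public static List<String> missingVariables(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingVariables(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("Both credential variables are set.");
        } else {
            System.out.println("Missing environment variables: " + missing);
        }
    }
}
```

Taking the environment as a Map parameter keeps the check easy to exercise with test values instead of the real process environment.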

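Install the required dependencies

This topic uses a Maven project as an example. To use the OpenSearch Java SDK in a Maven project, add the following dependencies to pom.xml.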

<dependency>
    <groupId>com.aliyun.opensearch</groupId>
    <artifactId>aliyun-sdk-opensearch</artifactId>
    <version>6.0.0</version>
</dependency>
<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-core</artifactId>
    <version>4.6.0</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.76</version>
</dependency>

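Create an instance

First, create an OpenSearch LLM Intelligent Q&A Edition instance. For details, see: Create an LLM Intelligent Q&A Edition instance

Configure the enterprise knowledge base

Now that the Intelligent Q&A Edition instance is created, upload your enterprise knowledge. Depending on the data type, you can push structured data, push unstructured data, or import a website.

  • Document import

    The following sample code imports one or more structured documents: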

    import java.util.HashMap;
    import java.util.Map;
    
    import com.alibaba.fastjson.JSONArray;
    import com.alibaba.fastjson.JSONObject;
    
    import com.aliyun.opensearch.OpenSearchClient;
    import com.aliyun.opensearch.sdk.generated.OpenSearch;
    import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
    import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
    import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;
    
    /**
     * Push structured data
     */
    public class testPushDemo {
    
        // Replace with the name of your instance
        private static String appName = "test";
        // Endpoint for service traffic
        private static String host = "http://opensearch-cn-shanghai.aliyuncs.com";
        // API path
        private static String path = "/apps/%s/actions/knowledge-bulk";
    
        public static void main(String[] args) {
    
            String appPath = String.format(path, appName);
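            // Credentials
            // Read the AccessKey ID and AccessKey Secret from environment variables; set them before running this sample
            String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
            String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
    
            // Create the OpenSearch object
            OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
            // Create the OpenSearchClient object from the OpenSearch object
            OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);
    
            // Build a single structured document
            JSONObject oneRequest = new JSONObject();
            oneRequest.put("cmd", "ADD");
            JSONObject fields = new JSONObject();
            // (Required) Document ID, must be unique
            fields.put("id", "1");
            // (Optional) Document title
            fields.put("title", "Product benefits");
            // (Optional) Document URL
            fields.put("url", "https://help.aliyun.com/document_detail/464900.html");
            // (Required) Document content
            fields.put("content", "Industry Algorithm Edition. Intelligent: rich built-in customized algorithm models, " +
                    "combined with the search characteristics of different industries, provide industry-specific recall and " +
                    "ranking algorithms for better search results. Flexible and customizable: developers can customize " +
                    "algorithm models, application schemas, data processing, query analysis, ranking, and other configurations " +
                    "based on their own business characteristics and data, which meets personalized search needs, raises the " +
                    "click-through rate of search results, speeds up business iteration, and greatly shortens the time to launch. " +
                    "Secure and stable: 24/7 operations and maintenance, technical support through online tickets and phone, " +
                    "and a complete set of incident response mechanisms such as fault monitoring, automatic alerting, and rapid fault location.");
            // (Optional) Document category
            fields.put("category", "OpenSearch,Industry Algorithm Edition");
            // (Optional) Timestamp that indicates how fresh the document is
            fields.put("timestamp", "1720668785888");
            oneRequest.put("fields", fields);
    
            // Multiple documents can be added in the same request
            JSONArray request = new JSONArray();
            request.add(oneRequest);
            //request.add(twoRequest);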
    
    
            Map<String, String> params = new HashMap<String, String>() {{
                put("format", "full_json");
                put("_POST_BODY", request.toJSONString());
            }};
            try {
                OpenSearchResult openSearchResult = openSearchClient.callAndDecodeResult(appPath, params, "POST");
                // Print the returned result
                System.out.println(openSearchResult.getResult());
            } catch (OpenSearchException e) {
                e.printStackTrace();
            } catch (OpenSearchClientException e) {
                e.printStackTrace();
            }
        }
    }

    The following sample code imports one or more unstructured documents (supported formats: doc, docx, pdf, html, txt, ppt, and pptx):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Base64;
    import java.util.HashMap;
    import java.util.Map;
    
    import com.alibaba.fastjson.JSONArray;
    import com.alibaba.fastjson.JSONObject;
    
    import com.aliyun.opensearch.OpenSearchClient;
    import com.aliyun.opensearch.sdk.generated.OpenSearch;
    import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
    import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
    import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;
    
    
    public class PushNonStructuralLLM {
        // Replace with the name of your instance
        private static String appName = "test";
        // Endpoint for service traffic
        private static String host = "http://opensearch-cn-shanghai.aliyuncs.com";
        // API path
        private static String path = "/apps/%s/actions/knowledge-bulk";
    
        public static void main(String[] args) throws IOException {
            // Credentials
            // Read the AccessKey ID and AccessKey Secret from environment variables; set them before running this sample
            String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
            String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
    
            String appPath = String.format(path, appName);
    
            // Create the OpenSearch object
            OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
            // Create the OpenSearchClient object from the OpenSearch object
            OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);
    
            // Build a single document; the local variable is named filePath so that it does not shadow the static field path
            Path filePath = Paths.get("/Users/xxx/Documents/sample_enterprise_knowledge_base.docx");
            JSONObject oneRequest = new JSONObject();
            // To upload an unstructured document (doc, docx, pdf, html, txt, ppt, pptx), set cmd to BASE64
            oneRequest.put("cmd", "BASE64");
            JSONObject fields = new JSONObject();
            // Document ID, must be unique
            fields.put("id", "2");
            // (Required) File name including its extension
            fields.put("title", "sample_enterprise_knowledge_base.docx");
            // (Optional) Document URL
            fields.put("url", "https://help.aliyun.com/document_detail/464900.html");
            // (Required) Document content, Base64-encoded
            fields.put("content", Base64.getEncoder().encodeToString(Files.readAllBytes(filePath)));
            // (Required) Document category
            fields.put("category", "OpenSearch,Intelligent Q&A Edition");
            // (Optional) Timestamp that indicates how fresh the document is
            fields.put("timestamp", "1720668785888");
            oneRequest.put("fields",fields);
    
            // Multiple documents can be added in the same request
            final JSONArray request = new JSONArray();
            request.add(oneRequest);
            //request.add(twoRequest);
    
            Map<String, String> params = new HashMap<String, String>() {{
                put("format", "full_json");
                put("_POST_BODY", request.toString());
            }};
            try {
                OpenSearchResult openSearchResult = openSearchClient.callAndDecodeResult(appPath, params, "POST");
                // Print the returned result
                System.out.println(openSearchResult.getResult());
            } catch (OpenSearchException e) {
                e.printStackTrace();
            } catch (OpenSearchClientException e) {
                e.printStackTrace();
            }
        }
    
    }
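
    Note

    Keep the number of documents in a single bulk push within the limit defined by the service; otherwise the push may fail with an error.

    For API details, see: PushKnowledgeDocuments - Push documents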

  • Website import

    The following sample code creates a website import task:

    import com.aliyuncs.CommonRequest;
    import com.aliyuncs.CommonResponse;
    import com.aliyuncs.DefaultAcsClient;
    import com.aliyuncs.IAcsClient;
    import com.aliyuncs.exceptions.ClientException;
    import com.aliyuncs.exceptions.ServerException;
    import com.aliyuncs.http.FormatType;
    import com.aliyuncs.http.MethodType;
    import com.aliyuncs.http.ProtocolType;
    import com.aliyuncs.profile.DefaultProfile;
    
    
    /**
     * Website import
     */
    public class CreateSpider {
        // Replace with the name of your instance
        private static String appName = "test";
        // API path
        private static String path = "/v4/openapi/app-groups/%s/chatos/spiders";
        public static void main(String[] args) {
            String appPath = String.format(path, appName);
    
            // Please ensure that the environment variables ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET are set.
            DefaultProfile profile = DefaultProfile.getProfile("cn-shanghai", System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"), System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"));
            /** use STS Token
             DefaultProfile profile = DefaultProfile.getProfile(
             "<your-region-id>",           // The region ID
             System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"),       // The AccessKey ID of the RAM account
             System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"),   // The AccessKey Secret of the RAM account
             System.getenv("ALIBABA_CLOUD_SECURITY_TOKEN"));     // STS Token
             **/
    
            IAcsClient client = new DefaultAcsClient(profile);
    
            CommonRequest request = new CommonRequest();
    
            //request.setProtocol(ProtocolType.HTTPS);
            request.setMethod(MethodType.POST);
            request.setDomain("opensearch.cn-shanghai.aliyuncs.com");
            request.setVersion("2017-12-25");
            request.setUriPattern(appPath);
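            // Request body: the URL of the site to import and the category assigned to it
            String requestBody =
                    "{\"url\":\"https://help.aliyun.com/zh/open-search/product-overview\",\"category\":\"OpenSearch help documentation\"}";
            request.putHeadParameter("Content-Type", "application/json");
            // Encode explicitly as UTF-8 so the bytes match the declared encoding on every platform
            request.setHttpContent(requestBody.getBytes(java.nio.charset.StandardCharsets.UTF_8), "utf-8", FormatType.JSON);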
            try {
                CommonResponse response = client.getCommonResponse(request);
                System.out.println(response.getData());
            } catch (ServerException e) {
                e.printStackTrace();
            } catch (ClientException e) {
                e.printStackTrace();
            }
        }
    }

    Note

    For API details, see: CreateSpider - Create a website import task
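
The note in the document import step says that a bulk push must stay within the service limit on the number of documents. One way to respect such a limit is to split the document list into fixed-size batches and send one knowledge-bulk request per batch. The following is a minimal sketch of the splitting step only, in plain Java; the class BatchSplitter, the method split, and the batch size of 2 are hypothetical and chosen for illustration, and the actual limit should be taken from the PushKnowledgeDocuments API reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSplitter {

    // Splits items into consecutive batches of at most batchSize elements, preserving order
    public static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Placeholder document IDs; in practice each element would be one document request object
        List<String> docIds = Arrays.asList("1", "2", "3", "4", "5");
        // Assumed batch size for the demo only
        List<List<String>> batches = split(docIds, 2);
        for (int i = 0; i < batches.size(); i++) {
            System.out.println("batch " + (i + 1) + ": " + batches.get(i));
        }
    }
}
```

Each batch would then be serialized into its own _POST_BODY and pushed with a separate call, so that a failure affects only the documents in that batch.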

Test the results

You have now built a dedicated enterprise knowledge base. Use the following sample code to test how well it answers questions:

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchClientException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchException;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;

import java.util.HashMap;
import java.util.Map;

public class LLMsearch {
    // Replace with the name of your instance
    private static String appName = "proLLM";
    // Endpoint for service traffic
    private static String host = "http://opensearch-cn-shanghai.aliyuncs.com";
    // API path
    private static String path = "/apps/%s/actions/knowledge-search";

    public static void main(String[] args) {
        String appPath = String.format(path, appName);
        // Credentials
        // Read the AccessKey ID and AccessKey Secret from environment variables; set them before running this sample
        String accesskey = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID");
        String secret = System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET");
        OpenSearch openSearch = new OpenSearch(accesskey, secret, host);
        // API read timeout
        openSearch.setTimeout(62000);

        OpenSearchClient openSearchClient = new OpenSearchClient(openSearch);


        // Build a single query
        JSONObject oneRequest = new JSONObject();
        JSONObject question = new JSONObject();
        // Enter the question you want to ask
        question.put("text", "What is OpenSearch");
        //question.put("session", "Session of the conversation; setting it enables multi-turn conversation");
        question.put("type", "TEXT");
        oneRequest.put("question", question);

        Map<String, String> params = new HashMap<String, String>() {{
            put("format", "full_json");
            put("_POST_BODY", oneRequest.toJSONString());
        }};
        
        try {
            OpenSearchResult openSearchResult = openSearchClient
                    .callAndDecodeResult(appPath, params, "POST");
            System.out.println("RequestID=" + openSearchResult.getTraceInfo().getRequestId());
            System.out.println(openSearchResult.getResult());
        } catch (OpenSearchException e) {
            System.out.println("RequestID=" + e.getRequestId());
            System.out.println("ErrorCode=" + e.getCode());
            System.out.println("ErrorMessage=" + e.getMessage());
        } catch (OpenSearchClientException e) {
            System.out.println("ErrorMessage=" + e.getMessage());
        }
    }
}

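Result returned by the query:

{"data":[
  {"reference":[
      {"tokenNum":141,"id":"c598ea1cf340fdb5a6bea0eb2c90db2a",
       "title":"Website Q&A_OpenSearch (Open Search)-Alibaba Cloud Help Center",
       "category":"LLM",
       "url":"https://help.aliyun.com/zh/open-search/llm-intelligent-q-a-version/website-q-a?spm=a2c4g.11186623.0.0.496565707tXzDl"},
      {"tokenNum":708,"id":"08d48b6f3fd96b158beca07e9858abc7",
       "title":"What are the product benefits of OpenSearch_OpenSearch (Open Search)-Alibaba Cloud Help Center",
       "category":"opensearch",
       "url":"https://help.aliyun.com/zh/open-search/product-overview/benefits"}],
   "answer":"OpenSearch is a service provided by Alibaba Cloud. It has the following features and benefits:\n\n1. **Industry Algorithm Edition**: Rich built-in customized algorithm models, combined with the search characteristics of different industries, provide industry-specific recall and ranking algorithms for better search results.\n\n2. **Flexible and customizable**: Developers can customize algorithm models, application schemas, data processing, query analysis, ranking, and other configurations based on their own business characteristics and data to meet personalized search needs.\n\n3. **Secure and stable**: Provides 24/7 operations and maintenance, with incident response mechanisms such as fault monitoring, automatic alerting, and rapid fault location. User data is protected through encryption, with access control and isolation.\n\n4. **Elastic scaling**: Users can scale resources out or in as needed.\n\n5. **Rich peripheral features**: Supports peripheral search features such as top searches, hints, drop-down suggestions, and statistical reports.\n\n6. **Out of the box**: No cluster deployment or maintenance is required, so the search service can be connected quickly.\n\n7. **High-performance Search Edition**: Supports high throughput, with write TPS in the tens of thousands for a single table and updates within seconds.\n\n8. **Vector Search Edition**: A stable foundation that supports retrieval over massive data and real-time updates, provides low-cost index compression strategies, and supports vector algorithms and SQL queries.\n\nThe services provided by OpenSearch also include Q&A testing, data push (such as website import), data query (such as the search demo), and other features (such as text vectorization and chunk vectorization). In addition, it provides documentation such as the product overview, quick start, operation guide, tutorials, developer reference, service support, and video section for users to learn from and use.",
   "type":"TEXT"},
  {"reference":[
      {"id":"08d48b6f3fd96b158beca07e9858abc7",
       "title":"What are the product benefits of OpenSearch_OpenSearch (Open Search)-Alibaba Cloud Help Center",
       "category":"opensearch",
       "url":"https://help.aliyun.com/zh/open-search/product-overview/benefits"}],
   "answer":"https://img.alicdn.com/tfs/TB1AOdINW6qK1RjSZFmXXX0PFXa-258-258.jpg",
   "type":"IMAGE"}]}
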
Note

You can also set parameters to suit your specific scenario and the results you expect. For details, see: SearchKnowledge - Q&A document query

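Summary

You have now implemented enterprise knowledge base Q&A with the Java SDK. To support knowledge base Q&A in production, integrate the corresponding OpenSearch APIs into your business. By building different types of knowledge bases, you can also support a wide range of scenarios such as intelligent documents, e-commerce shopping guides, and education Q&A.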